Active Learning: A Visual Tour

Zeel B Patel, IIT Gandhinagar, patel_zeel@iitgn.ac.in

Nipun Batra, IIT Gandhinagar, nipun.batra@iitgn.ac.in

Rise of Supervised Learning

  • Machine learning has spread into almost every field, including natural language processing (NLP), computer-aided diagnosis, optimization, and bioinformatics
  • Much of this success is driven by supervised learning
  • Supervised learning, however, needs labeled data

Data Annotation is Expensive

Speech Recognition

Human Activity Recognition

Not All Samples Are Equally Important

SVC Says: Closer is Better

Confusion in Digit Classification

GP Needs 'Good' Data Points

The Basics of Active Learning

Random Baseline

An ML model can randomly sample data points and send them to the oracle for labeling. Given enough queries, random sampling will also eventually capture the global distribution of the dataset in the training points. Active learning, however, aims to improve the model faster by selecting the data points for labeling intelligently. Random sampling is therefore a natural baseline against which to compare active learning.
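The random baseline can be sketched as a simple acquisition loop. The snippet below is a minimal illustration, assuming a scikit-learn classifier on the digits dataset; the dataset, model, seed-set size, and round count are our illustrative choices, not the authors' setup:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X, y = load_digits(return_X_y=True)
X_pool, X_test, y_pool, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Start with a small labeled seed set; the rest acts as the unlabeled pool.
labeled = [int(i) for i in rng.choice(len(X_pool), size=10, replace=False)]
unlabeled = [i for i in range(len(X_pool)) if i not in labeled]

model = LogisticRegression(max_iter=1000)
for _ in range(20):  # 20 rounds of acquisition
    # Random baseline: pick the next point uniformly at random.
    idx = int(rng.choice(unlabeled))
    unlabeled.remove(idx)
    labeled.append(idx)           # the "oracle" reveals y_pool[idx]
    model.fit(X_pool[labeled], y_pool[labeled])

print(f"Test accuracy after {len(labeled)} labels: {model.score(X_test, y_test):.2f}")
```

Swapping the random pick for an informed criterion is all that separates this baseline from the active learning strategies below.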

Different Scenarios for Active Learning

  1. Membership Query Synthesis: The model has access to an underlying distribution of the data from which it can generate new samples. The generated samples are sent to the oracle for labeling.
  2. Stream-Based Selective Sampling: Samples arrive in a live stream, and for each incoming sample the model can choose, based on some criterion, either to query the oracle or to discard the sample.
  3. Pool-Based Sampling: A pool of unlabeled samples is available in advance (the potential train points of the prior discussion). Based on some criterion, the model queries a few samples from the pool.

Pool-Based Sampling

  1. Uncertainty Sampling: We query the samples based on the model's uncertainty about the predictions.
  2. Query by Committee: In this approach, we create a committee of two or more models. The committee queries the samples on which its members' predictions disagree the most.

Uncertainty Sampling

Digit Classification with MNIST Dataset

  1. Least confident: In this method, we choose samples for which the most probable class's probability is minimum.

  2. Margin sampling: In this method, we choose samples for which the difference between the probability of the most probable class and the second most probable class is minimum.

  3. Entropy: Entropy can be calculated over $N$ classes using the following equation, where $P(x_i)$ is the predicted probability of the $i^{th}$ class. \begin{equation} H(X) = -\sum\limits_{i=1}^{N}P(x_i)\log_2 P(x_i) \end{equation}
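All three measures can be computed directly from a model's predicted class probabilities. Below is a minimal sketch assuming a `probs` array of shape `(n_samples, n_classes)`; the helper names are our own:

```python
import numpy as np

def least_confident(probs):
    """Higher score = more uncertain: 1 minus the top class probability."""
    return 1.0 - probs.max(axis=1)

def margin(probs):
    """Smaller margin = more uncertain: top-1 minus top-2 probability."""
    part = np.sort(probs, axis=1)
    return part[:, -1] - part[:, -2]

def entropy(probs):
    """Shannon entropy in bits; higher = more uncertain."""
    return -np.sum(probs * np.log2(probs + 1e-12), axis=1)

probs = np.array([[0.10, 0.80, 0.10],    # fairly confident prediction
                  [0.40, 0.35, 0.25]])   # uncertain prediction
print(least_confident(probs))  # second row scores higher
print(margin(probs))           # second row has the smaller margin
print(entropy(probs))          # second row has higher entropy
```

Note that least confident looks only at the top class, margin at the top two, and entropy at the full distribution, so they can rank the same pool differently.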


Regression on Noisy Sine Curve

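A minimal sketch of uncertainty sampling on such a noisy sine curve, assuming scikit-learn's `GaussianProcessRegressor` and querying the pool point with the largest predictive standard deviation (the seed points, kernel, and round count here are illustrative, not the exact setup used for the plot):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
X_pool = np.linspace(0, 2 * np.pi, 100).reshape(-1, 1)
y_pool = np.sin(X_pool).ravel() + 0.1 * rng.standard_normal(100)  # noisy sine

labeled = [0, 99]                 # seed with the two endpoints
unlabeled = list(range(1, 99))

gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
for _ in range(10):
    gp.fit(X_pool[labeled], y_pool[labeled])
    _, std = gp.predict(X_pool[unlabeled], return_std=True)
    # Query the pool point where the GP is most uncertain.
    idx = unlabeled[int(np.argmax(std))]
    unlabeled.remove(idx)
    labeled.append(idx)

print("Queried indices:", sorted(labeled))
```

For regression, the GP's predictive standard deviation plays the role that class probabilities play in classification.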

Query by Committee (QBC)

  1. Same model with different hyperparameters
  2. Same model with different segments of the dataset
  3. Different models with the same dataset
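The third option above, different models trained on the same data, can be sketched with vote entropy as the disagreement measure (a common QBC criterion; the model choices and seed-set construction below are illustrative):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
# Seed set: three labeled examples per class, so every model sees all classes.
labeled = [int(i) for c in range(3) for i in np.flatnonzero(y == c)[:3]]
pool = [i for i in range(len(X)) if i not in labeled]

# Committee option 3 above: different models, same labeled data.
committee = [RandomForestClassifier(random_state=0),
             LogisticRegression(max_iter=1000),
             SVC()]
for member in committee:
    member.fit(X[labeled], y[labeled])

votes = np.stack([member.predict(X[pool]) for member in committee])  # (3, n_pool)

def vote_entropy(votes, n_classes=3):
    """Disagreement per pool point: entropy of the committee's vote fractions."""
    fracs = np.stack([(votes == c).mean(axis=0) for c in range(n_classes)])
    fracs = np.clip(fracs, 1e-12, 1.0)  # avoid log(0)
    return -(fracs * np.log2(fracs)).sum(axis=0)

# Query the pool point the committee disagrees about the most.
query_idx = pool[int(np.argmax(vote_entropy(votes)))]
print("Most disagreed-upon pool index:", query_idx)
```

Vote entropy is zero when all members agree and maximal when the votes are split evenly, which is exactly the "disagree the most" criterion described above.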

Classification on Iris Dataset


In Animation 3, the boundaries between the colored regions are the decision boundaries. The points queried by the committee are those on which the learners disagree the most, as the plot shows. Initially, the models learn different decision boundaries from the same data; over the iterations they converge toward a similar hypothesis and thus start learning similar decision boundaries.

We now compare the overall F1-score of random sampling and QBC. QBC outperforms random sampling most of the time.

Comparison between Uncertainty sampling and QBC

  • For uncertainty sampling, we will use the Random Forest classifier.
  • For QBC, let us use three different classifiers (Random Forest Classifier, Logistic Regression, and Support Vector Classifier).
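A compact, illustrative version of this comparison might look as follows; the seed-set construction, number of rounds, and the use of vote entropy as the QBC disagreement measure are our assumptions rather than the exact experimental setup:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_pool, X_test, y_pool, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

def run(strategy, n_rounds=15):
    # Seed set: two labeled examples per class.
    labeled = [int(i) for c in range(3) for i in np.flatnonzero(y_pool == c)[:2]]
    pool = [i for i in range(len(X_pool)) if i not in labeled]
    rf = RandomForestClassifier(random_state=0)
    for _ in range(n_rounds):
        rf.fit(X_pool[labeled], y_pool[labeled])
        if strategy == "uncertainty":
            # Least-confident sampling with the Random Forest's probabilities.
            probs = rf.predict_proba(X_pool[pool])
            idx = pool[int(np.argmin(probs.max(axis=1)))]
        else:  # "qbc": vote entropy over RF, LR, and SVC
            lr = LogisticRegression(max_iter=1000).fit(X_pool[labeled], y_pool[labeled])
            svc = SVC().fit(X_pool[labeled], y_pool[labeled])
            votes = np.stack([m.predict(X_pool[pool]) for m in (rf, lr, svc)])
            fracs = np.clip(np.stack([(votes == c).mean(axis=0) for c in range(3)]),
                            1e-12, 1.0)
            ent = -(fracs * np.log2(fracs)).sum(axis=0)
            idx = pool[int(np.argmax(ent))]
        pool.remove(idx)
        labeled.append(idx)
    rf.fit(X_pool[labeled], y_pool[labeled])
    return f1_score(y_test, rf.predict(X_test), average="macro")

print(f"Uncertainty sampling F1: {run('uncertainty'):.2f}")
print(f"QBC F1: {run('qbc'):.2f}")
```

Both strategies end with the same Random Forest evaluated on the same held-out split, so any difference in F1 comes only from which points each strategy chose to label.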

How many samples to query at once?

Few More Active Learning Strategies

  1. Expected model change: Selecting the samples that would have the most significant change in the model.
  2. Expected error reduction: Selecting the samples likely to reduce the generalization error of the model.
  3. Variance reduction: Selecting samples that may help reduce output variance.

Thank you